Software Aids for Optimizing 0-1 Matrices
R.  Michael  Perry 


Abstract 

Here we consider the problem, given a real m-vector b and an integer
n < m, of finding an m x n matrix A such that the least-squares residual norm
||b - Ax||, where x = (A^T A)^{-1} A^T b, is minimized, subject to the constraint
that the entries of A must be 0's or 1's. This problem has arisen in a study of
the retention of information from visual and verbal sources. Mathematically it
is likely to be a difficult problem, however, and thus only "good", not optimal,
solutions are expected. A software package has been written to assist a human
operator in searching for such desirable matrices. This is described here, and
its use in the study is reviewed.


1.  Introduction 

In the standard linear least-squares problem we are given an m-dimensional
vector b and an m x n matrix A, with n < m, and asked to find an
n-dimensional vector x such that Ax approximates b in a least-squares sense,
that is, so that the Euclidean 2-norm residual, ||b - Ax||, is minimized. The
solution vector x is given (ignoring possible conditioning problems with matrix
A) by x = (A^T A)^{-1} A^T b. The resulting residual norm ||b - Ax|| will be
denoted by r(b,A). It should be noted that computation of the solution x and
the residual r(b,A) is straightforward and can be performed reasonably
efficiently, that is, in time that is a fairly small polynomial in the
parameters m and n that determine the size of the problem.
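
For reference, the closed form quoted above follows from the normal equations. The short derivation below uses only the definitions already given and assumes that A has full column rank, so that A^T A is invertible.

\[ \nabla_x \|b - Ax\|^2 = -2A^T(b - Ax) = 0 \;\Longrightarrow\; A^T A\,x = A^T b \;\Longrightarrow\; x = (A^T A)^{-1} A^T b, \qquad r(b,A) = \|b - A(A^T A)^{-1} A^T b\|. \]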

Here we will consider the problem, given a real m-vector b and an integer
n < m, of finding an m x n matrix A such that r(b,A) is minimized, subject to
the constraint that the entries of A must be 0's or 1's. This problem has arisen
in a psychological study of the retention of information from visual and verbal
sources [1]; a further discussion is given later. Mathematically the problem is
known to be NP-complete in some forms [2]. Thus, unlike the least-squares
problem, no efficient algorithms for solving it are known for most cases of
interest and quite possibly none exist. That is, although the problem could be
solved by exhaustively considering all the 2^{mn} possible 0-1 matrices with
dimensions m x n, it is doubtful if there exists any algorithm that always finds
a best matrix in time that is polynomial in m and n. In practical terms the
optimization problem is likely to be unsolvable even for small values of m and n
(say, for mn < 100).
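
To give a feel for the size of the search space, the following minimal sketch counts the candidate matrices for a 17 x 5 problem, the typical size reported in the implementation section. The function name and the rate of one million least-squares solves per second are illustrative assumptions, not figures from the study.

```python
def count_01_matrices(m: int, n: int) -> int:
    """Number of distinct m-by-n matrices with entries in {0, 1}."""
    return 2 ** (m * n)


m, n = 17, 5
total = count_01_matrices(m, n)

# Hypothetical throughput: one million least-squares solves per second.
rate = 10 ** 6
seconds_per_year = 365 * 24 * 3600
years = total // (rate * seconds_per_year)

print(f"{total} candidate matrices, about {years} years of exhaustive search")
```

Under these assumptions the exhaustive search would take on the order of 10^12 years, which is why only heuristic search is considered in what follows.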

Often, however, one is not interested in an optimal matrix exclusively, but
would simply like to find the best matrix possible, that is, the one with the
smallest residual, subject to a reasonable limit on the time spent searching.
Complex strategies using more basic heuristics may prove useful. Thus user
interaction is desirable so that the search can be guided by intelligent
decision-making.

In the work described here, several software tools were created for
constructing and modifying 0-1 matrices. Among these are heuristics to find a
matrix  with  low  residual  norm  or  to  modify  a  previously  found  matrix  to  reduce 
the  residual.  Most  of  the  procedures  are  straightforward  and  are  not  described 
in  detail  but  are  mentioned  in  the  section  on  implementation.  The  section  on 
algorithms  that  follows  will  deal  with  the  two  major  heuristics  that  are  used, 
both  of  which  were  suggested  by  Andrzej  Ehrenfeucht. 


2. Algorithms

There  are  two  important  algorithms  used  in  generating  low-residual 
matrices. One adds a "best" column, the other modifies the rows to reduce the
residual. Thus to use these heuristics it is necessary to have a matrix to start
with. In the application [1] it is assumed that the first column of any matrix is
filled with 1's, so that an initial matrix is always given. Strictly speaking, this
changes  the  optimization  problem  from  what  was  defined  above,  since  the  first 
column  of  the  matrix  is  not  allowed  to  vary,  but  the  modified  problem  is  still 
likely  to  be  intractable,  so  that  similar  considerations  will  apply. 

To add a "best" column we start with an m x (n-1) matrix A_0, assumed to
contain a "constant" column filled with 1's, and look at the residual vector
v = b - A_0 x_0 obtained by subtracting the least-squares approximation A_0 x_0
from b. The column to be added will be the one which, when linearly combined
with the constant column, secures a least-squares fit to the residual vector.
Thus in essence the column sought is a basis function which secures a best fit
when linearly combined with a constant function. Since the column or basis
function is limited in its values to 0 and 1, the fitting function obtained by
linear combination must also be two-valued, though in this case the values can
be arbitrary real numbers. The best column can thus be determined by finding
the best two-valued approximation of the vector v. That is, if f is a two-valued
function that secures a least-squares fit to v (over the set of all two-valued
functions), then the desired column will be obtained if the ith entry of the
column is set to 0 whenever f(i) is the smaller value, and to 1 otherwise. (Or
the 0 entries could correspond to the larger values f(i) and the 1's to the
smaller values. The existence of the constant column allows any other column in
the matrix to be complemented without affecting the residual norm.)
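
The remark about complementing a column can be checked with one line of algebra. In the sketch below, e, c, x_e and x_c are labels introduced only for this illustration: e is the constant column of 1's, c is any other 0-1 column, and x_e, x_c are their coefficients in a linear combination.

\[ x_e\,e + x_c\,c = (x_e + x_c)\,e - x_c\,(e - c). \]

Every combination of e and c is therefore also a combination of e and the complemented column e - c, and conversely. The two matrices span the same column space, so they give the same least-squares approximation of b and the same residual norm.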

To find the best-fitting two-valued function f, we first sort the entries of v
in order of increasing size. This will greatly simplify the problem, and the
desired solution for the original case can then be determined by a
straightforward unscrambling.

Assuming then that the entries v_i of v are sorted as indicated, that is, so
that v_j ≥ v_i whenever j ≥ i, it remains to determine a best-fitting function f.
This will not be difficult once it is established that f too can be an increasing
function, i.e., that for some best-fitting f, f(j) ≥ f(i) whenever j > i. To show
this in turn, suppose that f is a function such that f(j) < f(i) for some j > i.
Then a function that fits just as well can be defined by interchanging the values
of f so that f(j) is assigned to i and f(i) to j. In other words we claim that the
discrepancy in fitting for the new function is no worse than that for the old, or
that


\[ (v_i - f(j))^2 + (v_j - f(i))^2 \le (v_i - f(i))^2 + (v_j - f(j))^2. \tag{1} \]

To show this in turn, let p = v_i, q = v_j, r = f(j), s = f(i). Then p ≤ q, r < s,
and we claim that

\[ (p - r)^2 + (q - s)^2 \le (p - s)^2 + (q - r)^2. \tag{2} \]

By expanding terms the above inequality is equivalent to

\[ pr + qs \ge ps + qr. \tag{3} \]


Clearly (2) holds when p = r, because in this case p is ≤ both q and s, so that
|q - s| cannot exceed the larger of |p - s| and |q - p|; (3) must hold also. Next,
suppose that (2) and (3) hold for some values p, q, r, and s. Then (3) must also
hold for p, q, r+t, s+t, where t is an arbitrary real constant, as can be seen by
expanding terms. From this it follows that (2) and (3) must hold for arbitrary
p, q, r, s when the initial inequalities p ≤ q and r < s are satisfied. From this
in turn we can assume without loss of generality that the function f is
increasing.
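
As a cross-check on the argument above, the same conclusion can be reached directly by expanding the difference of the two sides of (2). This is a supplementary derivation, not the route taken in the text.

\[ (p - r)^2 + (q - s)^2 - (p - s)^2 - (q - r)^2 = 2(ps + qr - pr - qs) = 2\,(p - q)(s - r) \le 0, \]

since p - q ≤ 0 and s - r > 0. Hence (2), and with it (3), holds whenever p ≤ q and r < s.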

Since, on the other hand, f has only two values it must have the form
f(j) = r whenever 1 ≤ j ≤ i for some i < m and f(j) = s for i < j ≤ m. (Here
we assume m ≥ 2; also note that the case that v is constant, which will only
occur if v = 0 in view of the constant column, is handled by allowing r = s = 0.)
The best choice for f will be one that minimizes the discrepancy with v, which in
turn is given by


\[ \sum_{j=1}^{i} (v_j - r)^2 + \sum_{j=i+1}^{m} (v_j - s)^2. \tag{4} \]


For a given value i, the best choices of r and s are respectively the means of
v_j over the intervals 1 ≤ j ≤ i and i+1 ≤ j ≤ m; thus

\[ r = \frac{1}{i} \sum_{j=1}^{i} v_j, \qquad s = \frac{1}{m-i} \sum_{j=i+1}^{m} v_j. \tag{5} \]


The best choice for f, then, is found by selecting the value i that achieves the
minimum discrepancy according to (4), using the values for r and s given in (5).
By expansion of terms the expression to be minimized becomes

\[ \sum_{j=1}^{m} v_j^2 - \frac{1}{i} \Bigl( \sum_{j=1}^{i} v_j \Bigr)^2 - \frac{1}{m-i} \Bigl( \sum_{j=i+1}^{m} v_j \Bigr)^2, \tag{6} \]

which is convenient for computation.
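
The sorting step, the choice of the split point i, and the unscrambling can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the original code: the name best_column is hypothetical, v is taken to be a plain list of residual entries, and the cost of each split is the expanded form of (4) with r and s set to the two group means as in (5).

```python
def best_column(v):
    """Return (column, discrepancy) for the best two-valued fit to v.

    column[k] is 0 where the fit takes its smaller value and 1 elsewhere.
    """
    m = len(v)
    if m < 2:
        raise ValueError("need at least two entries")
    # Indices of v in order of increasing value (the sorting step).
    order = sorted(range(m), key=lambda k: v[k])
    total = sum(v)
    total_sq = sum(t * t for t in v)
    best_i, best_cost = None, None
    prefix = 0.0
    for i in range(1, m):  # split after the i smallest entries
        prefix += v[order[i - 1]]
        suffix = total - prefix
        # Expanded form of (4) with r, s equal to the group means (5).
        cost = total_sq - prefix * prefix / i - suffix * suffix / (m - i)
        if best_cost is None or cost < best_cost:
            best_i, best_cost = i, cost
    # Unscramble: the m - best_i largest entries of v receive a 1.
    column = [0] * m
    for k in order[best_i:]:
        column[k] = 1
    return column, best_cost
```

For example, best_column([0.1, 5.0, -0.2, 4.8, 0.0]) separates the two entries near 5 from the three entries near 0. Running prefix sums keep the scan over split points linear, so the sort dominates the cost.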

The best f can then be decoded as indicated earlier (including unscrambling)
to obtain a best column to add to matrix A_0. In this way a matrix A can be
built up column by column. Although the matrix will not in general be optimal,
the heuristic has proved highly useful, particularly when combined with other
heuristics, the most important of which will now be described.

This heuristic modifies the rows of A in an attempt to find a better matrix.
Initially we are given vector b, matrix A, and the least-squares solution vector x.
Each entry b_i of b is approximated by taking the inner product of the ith row of
A and the vector x. A better matrix will result if this ith row is replaced by
another row whose inner product with x gives a better approximation to b_i.
Rows of A can be modified independently of each other to find improvements in
this way. The result (assuming some improvements are found) will be a matrix
A' such that ||b - A'x|| < ||b - Ax||. The vector x, however, will not in general
be the least-squares solution vector for A'; this latter vector, call it x', must
then be computed and will give a still better fit to b. The heuristic can then be
reapplied to the rows of A' using the new vector x'.

In practice the new rows are found simply by exhaustive searching. In
particular, since the m rows of A can be modified independently of each other,
there are only m·2^n combinations that must be considered for the most general
case, rather than all the 2^{mn} possible matrices. Thus if n, the number of
columns, is not too large the rows can be searched exhaustively to find the best
A'. (This of course will not guarantee an optimal matrix but like the other
heuristic it has proved useful.) In practice it has generally been desirable to
limit the columns to be modified in searching for the best rows; for example a
constant column has usually remained fixed. By suitably limiting the columns in
this way, search times can be kept reasonable even when n is large.
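
One pass of the row heuristic, with x held fixed and only some columns allowed to vary, might look like the following sketch. The names improve_rows and free_cols are hypothetical, and A is assumed to be a list of rows. After such a pass the least-squares vector must be recomputed for the new matrix, as described above.

```python
from itertools import product


def improve_rows(b, A, x, free_cols):
    """One pass of the row heuristic with x held fixed.

    For each row, the entries in free_cols are set to the 0-1 pattern whose
    inner product with x is closest to the matching entry of b. All other
    columns keep their current values.
    """
    new_A = []
    for b_i, row in zip(b, A):
        best_row = list(row)
        best_err = abs(b_i - sum(a * t for a, t in zip(row, x)))
        # Exhaustive search over the 2**len(free_cols) candidate rows.
        for bits in product((0, 1), repeat=len(free_cols)):
            cand = list(row)
            for col, bit in zip(free_cols, bits):
                cand[col] = bit
            err = abs(b_i - sum(a * t for a, t in zip(cand, x)))
            if err < best_err:
                best_row, best_err = cand, err
        new_A.append(best_row)
    return new_A
```

Keeping the constant column out of free_cols mirrors the practice of leaving that column fixed, and the search cost per row is 2^k for k free columns rather than 2^n.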


3.  Implementation 

A software package has been created to assist a human operator in searching
for 0-1 matrices with small residual norms. A number of user-defined
constraints can be imposed on the matrices that are to be found. Thus a variety
of problems can be defined and the search can be guided by intelligent
decision-making.

To begin, then, a b-vector is written to a specified file; this vector, of
course, will remain fixed during computation. The user will then attempt to find
a 0-1 matrix A in the appropriate form having low residual r(b,A). To assist this
process there are (1) routines for defining A directly, that is, by setting entries
individually or by such operations as redefining specified rows or columns, (2) a
procedure for determining the least-squares vector x, given b and A, and (3)
heuristics  which  automatically  modify  or  add  columns  to  A  or  which  suggest 
other  possible  improvements.  The  software  is  extensively  documented  and 
should  be  usable  without  difficulty.  The  various  components  will  now  be 
described  briefly.  The  documentation  should  be  consulted  for  further  details. 

Matrix A must be written to a specified file so that the least-squares vector
x and the residual r(b,A) can be determined. (At present this file is fixed, as
are those for the b-vector and for other information that may be needed, though
this could easily be changed.) The matrix entries can be keyed in directly using
one of the system's editors. In addition there is a "modify" routine in which
instructions added to the matrix file are executed to change the entries. The
available instructions include "delete", "replace", "insert", "union", and
"complement", each of which performs the corresponding operation involving one
or more columns of the matrix. ("Union" replaces a specified column with the
bitwise "or" of two or more columns, while "complement" takes the bitwise
complement of one column.) In addition a row of the matrix can be replaced using
the "rcrow" instruction. Use of the modify routine can greatly reduce the labor
(and error) of making alterations in a matrix by hand.

When the b-vector and A-matrix have been set as desired, the routine
"solve" can be called to determine the least-squares fitting vector x. This is
calculated by a straightforward application of Cholesky decomposition using the
normal-equations matrix A^T A, thereby obtaining x = (A^T A)^{-1} A^T b [3]. The
residual norm r(b,A) is also derived. Although this method can lead to
conditioning problems it is relatively fast and is stable in the cases of interest
to date, that is, for nonsingular 0-1 matrices of fairly small dimensions. The
solve routine also notes linearly dependent columns for a singular matrix.
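
The normal-equations approach with a Cholesky factorization can be sketched as follows. This is an illustrative Python version, not the Franz LISP routine: the function merely reuses the name "solve", the dependence tolerance of 1e-12 is an arbitrary assumption, and a dependent column is reported by raising an error rather than by whatever reporting the original routine performs.

```python
import math


def solve(b, A):
    """Least-squares x and residual norm r(b, A) via Cholesky on A^T A."""
    m, n = len(A), len(A[0])
    # Normal-equations matrix G = A^T A and right-hand side c = A^T b.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Lower-triangular Cholesky factor L with G = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:  # assumed tolerance for dependence
                    raise ValueError(
                        f"column {i} depends linearly on earlier columns")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # Forward substitution L y = c, then back substitution L^T x = y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (c[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    residual = [b[k] - sum(A[k][j] * x[j] for j in range(n)) for k in range(m)]
    return x, math.sqrt(sum(t * t for t in residual))
```

For instance, with b = [1, 3, 5, 7] and columns (1, 1, 1, 1) and (0, 0, 1, 1), the fit is the pair of group means, giving x = (2, 4) and a residual norm of 2.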

Currently there are five other routines that solve the linear least-squares
problem: "asolve", "bsolve", "csolve", "dsolve", and "nsolve". These, however,
are used mainly to find a better matrix A with a smaller residual r(b,A). The
asolve routine applies the first heuristic of the previous section, adding a best
column to A. The bsolve routine, using the same heuristic, adds columns
iteratively until r(b,A) falls below a specified tolerance. The csolve routine
modifies the rows of A using the second heuristic, printing the best solution and
also alternate versions of rows that achieved a particularly good fit. The dsolve
routine is a simplified version of csolve in that only two versions of each row
are considered, namely, the original row and its complement. However it prints
results of least-squares solving for each row when complemented individually.
These results are sorted "best-first", that is, in the order of increasing
residuals. In addition, all rows which, when complemented individually, gave a
better fit are complemented simultaneously and the resulting solution and
residual are printed. The nsolve routine deletes columns of the matrix
individually, solves the least-squares problem for each resulting matrix, and
sorts the results best-first. In this manner columns which make relatively
little contribution to the fitting can be identified and deleted, giving a smaller
matrix with nearly the same residual.
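
The column-deletion idea behind nsolve can be sketched as follows. This is a hypothetical reimplementation for illustration only: the helper residual_norm, its Gauss-Jordan solver, and the 1e-12 pivot tolerance are assumptions rather than details of the package, and A is assumed to be a list of row lists with at least two columns.

```python
import math


def residual_norm(b, A):
    """r(b, A) from the normal equations, solved by Gauss-Jordan elimination."""
    m, n = len(A), len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         + [sum(A[k][i] * b[k] for k in range(m))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("columns are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    x = [M[i][n] / M[i][i] for i in range(n)]
    return math.sqrt(sum((b[k] - sum(A[k][j] * x[j] for j in range(n))) ** 2
                         for k in range(m)))


def nsolve(b, A):
    """Residual after deleting each column in turn, sorted best-first."""
    results = []
    for col in range(len(A[0])):
        reduced = [row[:col] + row[col + 1:] for row in A]
        results.append((residual_norm(b, reduced), col))
    return sorted(results)
```

The column at the head of the returned list is the one whose removal costs the least in residual, and so is the natural candidate for deletion.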

The  software  package  was  coded  in  Franz  LISP,  under  the  UNIX  operating 
system,  and  is  now  running  on  a  VAX  11/780  computer  at  the  University  of 
Colorado  Computer  Science  Department.  LISP  was  found  to  be  a  convenient 
language  for  coding,  particularly  for  the  modify  routine  and  for  such  features  as 
dynamic allocation of arrays. It should be noted, however, that only small
matrices have been considered (typically about 17 x 5) so that execution
efficiency was not of primary concern. Typically about 30 sec. was required for
one run of the modify routine, with 3-10 min. being the rule for one of the
solving routines.


4.  Application 

In the one major application to date [1] a study was made of the retention
of  information  in  the  human  memory.  Subjects  were  shown  an  educational 
movie,  some  being  presented  with  both  narration  (or  written  text)  and  with  the 
visual  portion,  while  in  other  cases  the  verbal  or  visual  component  was  omitted. 
Other  subjects  were  given  verbal  information  followed  by  visual  presentation 
without  sound,  or  vice  versa.  The  subjects  then  were  tested  for  retention  of 
information, (1) immediately and (2) after a one-week delay. In this manner 17
test  scores  were  obtained  measuring  the  amount  of  information  retained  under 
varying  conditions  of  acquisition  and  testing  delay. 

The  next  step  was  to  find  a  sensible  explanation  of  these  results,  and  it 
seemed  natural  to  interpret  them  in  terms  of  features  that  were  either 
definitely  present  or  definitely  absent  in  each  of  the  17  subject  categories.  One 
obvious  feature  of  this  type,  for  example,  was  the  "visual"  one  that  was  present 
in  those  categories  in  which  the  visual  portion  of  the  movie  was  shown,  and 
absent  in  the  others. 

By  selecting  the  right  set  of  features,  then,  it  was  hoped  that  every  test 
score would be accounted for by the features present or absent in each
particular group. That is, it was assumed that each feature would contribute a specific
numerical  amount  to  the  test  scores  of  all  categories  in  which  it  was  present, 
with  zero  contribution  if  absent.  Each  feature,  then,  would  be  assigned  a  value, 
positive  or  negative,  by  which  it  would  affect  a  test  score  if  present.  Ideally, 
then,  the  test  score  of  a  particular  category  would  be  exactly  reproduced  by 
adding the values of the features that were present. This would require an apt
choice  of  features  and  a  correct  assignment  of  values  as  well.  It  was  recognized, 
however,  that  there  should  be  some  toleration  of  discrepancies,  as  for  example, 
if  the  actual  and  calculated  scores  did  not  differ  by  a  statistically  significant 
amount. 

In each case the values assigned for a particular choice of features were
those that achieved a least-squares fit to the test scores. The problem then was
to find a "reasonable" set of features that would give a reasonable least-squares
fit. Mathematically, then, the 17 test scores formed an m-vector b with m = 17,
while a set of n features comprised an m x n 0-1 matrix A, each feature
contributing one column. The values of the features were then contained in the
n-vector x obtained by least-squares fitting.

A  reasonable  solution  of  the  problem  would  be  a  small  number  of  physically 
meaningful features that gave a good fit to the 17 scores. The software package
described  in  this  report  was  used  in  searching  for  such  a  set  of  features,  and 
finally  five  features  were  chosen  that  satisfied  all  requirements.  (One  of  these 
was  the  "baseline"  or  constant  column  that  figured  in  the  previous  section.)  It  is 
important to note, however, that, since the features had to be "physically
meaningful", it was not sufficient to simply find a matrix with low residual. Instead
there  were  further  constraints  that  were  difficult  to  delineate  mathematically. 
Thus  the  human  operator  was  crucial,  both  in  finding  solutions  and  in  rejecting 
those  that  were  unrealistic.  At  any  rate,  a  satisfactory  solution  was  eventually 
obtained,  and  the  paper's  conclusions  could  then  be  stated.  Among  these  was 
the  interesting  observation  that  "in  a  show  and  tell  presentation,  one  should  not 
tell  first  and  show  second". 

The  results,  then,  were  obtained  by  a  lengthy  interaction  of  person  and 
machine,  both  of  which  were  indispensable. 